Bioinformatics Data Skills
Utah Valley University - BIOL490R (Special Topics)
Handy links:
(Anyone with link can edit)
Command Line Projects and the Unix Philosophy
Week 1
Topics:
- What are “data skills?” | Reproducibility and open science | How to learn bioinformatics | Documentation | The importance of caution
Assignments:
- Read through BDS Chapter 1… twice, and carefully
- Find and explore the supplemental materials for the chapter on GitHub
- Go through the resources below (Do this every week before class!)
- Assignment 1 - Reflection piece on why you want to learn command line skills and best practices
- Set up your computer environment (Command-line, Git)
Resources
Practice
For your consideration:
- “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” –Brian Kernighan
- “Since the computer is a sharp enough tool to be really useful, you can cut yourself on it.” – John Tukey
Back to top of page
Week 2
Proper Project Organization
Topics:
- One directory per project | data as ‘read-only’ | rules for naming things | project structure | documentation
Assignments:
- Read through BDS Chapter 2 at least once
- Work through BDS Chapter 2, following along in your own terminal
- Assignment 2 - Create an organized project template using code
Resources
Practice
- Re-create your project directory template by copy-pasting each line of code from your assignment to make sure it gives the same result
- Spend time making sure that you intuitively understand relative filepaths and get comfy with the terminal
- Spend 2-3 hours mucking about in your terminal reworking the lines from Chapter 2 over and over until it feels normal
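To make the relative-filepath practice concrete, here is a minimal sketch. The project name and subdirectory layout are hypothetical, not the template from the assignment, and everything happens in a scratch directory so nothing real gets touched.

```shell
# Hypothetical project name and layout; adapt to the structure from your own assignment.
workdir=$(mktemp -d)          # scratch location so nothing real is touched
cd "$workdir"

# One directory per project, with subdirectories for data, scripts, and results
mkdir -p my_project/data/raw my_project/scripts my_project/results
touch my_project/README.md

# Relative paths are resolved from the current working directory
cd my_project/scripts
ls ../data                          # one level up, then into data/
raw_dir=$(cd ../data/raw && pwd)    # absolute path that the relative path points to
echo "$raw_dir"
```

Try predicting what each relative path resolves to before you press Enter, then check yourself with pwd.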
For your consideration:
- If you are learning to play the piano, and you settle for a couple hours a week of instruction without practicing on your own, you’re gonna be a really crappy piano player, like me. –Geoff Zahn
Unix refresher and sequence data types
Week 3
The Unix Shell
Topics:
- The Unix philosophy | text streams | pipes and redirection | process control | process substitution
Assignments:
- Read through BDS Chapter 3
- Work through BDS Chapter 3, following along in your own terminal
- Assignment 3 - Running shell scripts, redirecting, pipes, background processes
- Read/watch ALL of the resources below. Be able to write a for-loop.
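If for-loops are new to you, the sketch below shows one possible shape, combined with redirection and a pipe. The file names and contents are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Three tiny made-up files to loop over
printf 'ACGT\nTTGA\n' > sample_A.txt
printf 'GGCC\n' > sample_B.txt
printf 'ATAT\nCGCG\nTATA\n' > sample_C.txt

# Loop over every matching file; the shell expands the glob before the loop starts
for f in sample_*.txt; do
    # Redirect the file into wc so only the count is printed, not the filename
    n=$(wc -l < "$f")
    # $((n)) strips the padding that some versions of wc add
    echo "$f $((n))"
done > line_counts.txt        # redirect the whole loop's stdout to one file

# Pipe-friendly follow-up: sort numerically on column 2, largest first
sort -k2,2nr line_counts.txt
```

Note that the redirection sits after `done`, so the output file is opened once for the entire loop rather than once per iteration.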
Resources
Practice
For your consideration:
- “This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.” –Doug McIlroy
Week 4
Working with Sequence Data
Topics:
- fasta and fastq file formats | using existing tools to work with sequence data
Assignments:
- Read through BDS Chapter 10 at least once
- Don’t work through the examples yet (we can return to them once we have more skills)
- Assignment 4 - converting between formats, inspecting and trimming reads, using pre-made command-line tools
Resources
Practice
For your consideration:
- “Treat data as read-only.” –Vince Buffalo
- Never directly edit any fasta or fastq file! If you have to make edits, redirect them to a new version of the raw file.
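One way the "redirect to a new file" rule might look in practice is sketched below. The reads are made up, and the awk step assumes every FASTQ record is exactly four lines (no wrapped sequences), which is not guaranteed for every file you will meet.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up two-read FASTQ file standing in for raw data
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGGCCAA\n+\nIIIIIIII\n' > raw_reads.fastq
chmod a-w raw_reads.fastq     # make the raw file read-only as a safety net

# Assumed layout: four lines per record (header, sequence, separator, qualities).
# Keep lines 1 and 2 of each record, swap the leading "@" for ">",
# and redirect the result to a NEW file instead of editing the original.
awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }' raw_reads.fastq > reads.fasta

cat reads.fasta
```

The raw file is only ever read, so if the conversion turns out to be wrong you can delete reads.fasta and try again without losing anything.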
Week 5
Combining Unix Skills and Command-Line Software
Topics:
- Interfacing with command-line tools | redirecting stdout and stderr | customizing parameters
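As a rough illustration of redirecting stdout and stderr, the sketch below uses a hypothetical stand-in function, run_tool, in place of a real bioinformatics program. Real tools differ in what they send to each stream, so check their documentation.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a command-line tool:
# results go to stdout, log messages go to stderr
run_tool() {
    echo "result line"
    echo "warning: something worth logging" >&2
}

run_tool > results.txt 2> log.txt      # stdout and stderr to separate files
run_tool > combined.txt 2>&1           # both streams into one file (the order of the redirections matters)
run_tool 2> /dev/null | wc -l          # discard stderr, pipe only stdout onward
```

Keeping the log in its own file means the results file stays clean enough to feed straight into the next tool.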
Assignments:
Resources
Week 6
Topics:
Assignments:
Resources
- Introduction to regular expressions video
- sed video playlist (definitely worth your time!)
Practice
Week 7
Topics:
- More handy shell programs: cut, paste, sort, uniq, tr, rename, tee, xargs, awk
- Manipulating text data from one format to another
Assignments:
- Continue working through BDS Chapter 7
- Assignment 5 - convert between tabular and fasta formatted data | process/command substitution
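For a feel of what converting between tabular and fasta formats can involve, here is a minimal sketch on a made-up two-column table. It is not the assignment solution, and the reverse step assumes each sequence sits on a single line.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up two-column table: sequence ID, then sequence, separated by a tab
printf 'seq1\tACGTAC\nseq2\tGGCCTT\n' > seqs.tsv

# Tabular -> FASTA: print ">" plus column 1, then column 2 on the next line
awk -F '\t' '{ print ">" $1; print $2 }' seqs.tsv > seqs.fasta

# FASTA -> tabular (single-line sequences only): join each pair of lines with a tab,
# then strip the leading ">" from the header
paste - - < seqs.fasta | sed 's/^>//' > roundtrip.tsv

# No output from diff means the round trip reproduced the original table
diff seqs.tsv roundtrip.tsv
```

A round trip like this is a cheap sanity check that a conversion did not silently drop or mangle records.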
Resources
- “Process substitution” vs “command substitution” VIDEO
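If the difference between the two is still fuzzy, this small sketch may help. It uses a made-up file, fruits.txt, and needs bash, because process substitution is not part of plain sh.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up file used only for this illustration
printf 'apple\nbanana\ncherry\n' > fruits.txt

# Command substitution: $( ... ) is replaced by the TEXT the command prints
n_lines=$(wc -l < fruits.txt)
echo "fruits.txt has $((n_lines)) lines"

# Process substitution: <( ... ) is replaced by a FILE-LIKE path a program can read,
# so commands that expect filenames can take the output of other commands instead
paste <(seq 3) <(tr 'a-z' 'A-Z' < fruits.txt) > numbered.txt
cat numbered.txt
```

Running `echo <(seq 3)` shows the path that bash hands to the program, which makes the "file-like" part easier to believe.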
Practice
- Here’s an awful-looking one-line command that prints out the phylum from each line of Chapter_7_Practice_File_2.txt along with a number sequence next to it showing which line of the file it came from.
- It uses both process and command substitution, but essentially, it’s just the paste command pasting together the phylum in the first field and the numbers 1-34 in the second field
Break it apart, look at each component, and make sure you understand why it works!
paste <(cat Chapter_7_Practice_File_2.txt | cut -d ";" -f 2) <(seq $(wc -l Chapter_7_Practice_File_2.txt | cut -d " " -f 1))
If you wanted to use process substitution again to extend this whole command in order to add a header to the output, what would you do? (i.e., add a first row that is “PHYLUM LINE_NUMBER”)
Finding and Retrieving Data
Week 8
Online Repositories and Approaches to Downloading
Topics:
- NCBI / SRA
- Searches, filters, metadata
- Database files and formats
- Documenting data acquisition
- Checksums
- File compression
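The checksum and compression topics above can be sketched roughly as follows. The file is a made-up stand-in for a real download, and the sketch assumes GNU coreutils `sha256sum` is installed (on macOS, `shasum -a 256` plays the same role).

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '>seq1\nACGT\n' > download.fasta      # made-up stand-in for a downloaded file

# Record a checksum right after downloading
sha256sum download.fasta > download.fasta.sha256

# Later, confirm the file has not changed; prints "download.fasta: OK" when it matches
sha256sum -c download.fasta.sha256

# Compress to a new file; -c writes to stdout, so the original is left in place
gzip -c download.fasta > download.fasta.gz

# Read the compressed copy without unpacking it to disk
gzip -dc download.fasta.gz | head -n 1
```

Storing the .sha256 file next to the data is one simple way to document what you downloaded and to notice later if it was altered.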
Assignments:
Resources
Practice
Working with Supercomputers
Week 9
Interfacing with Remote Machines
Topics:
Assignments:
- Work through BDS Chapter 4 before class this week
- Assignment 7 - build 3 separate SLURM scripts to run fasta analyses
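For orientation, a SLURM script might look something like the sketch below. The job name, resource requests, and file names are hypothetical, and many clusters also require site-specific options (such as an account or partition) that are not shown here.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '>a\nACGT\n>b\nGGCC\n' > toy.fasta    # made-up input file

# Write a hypothetical job script; all resource values are placeholders
cat > count_seqs.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=count_seqs
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --output=count_seqs_%j.out

# Count fasta records by counting header lines
grep -c '^>' toy.fasta > seq_count.txt
EOF

# On a cluster you would submit it with: sbatch count_seqs.slurm
# The #SBATCH lines are ordinary comments to bash, so the script can be dry-run locally first
bash count_seqs.slurm
cat seq_count.txt
```

Dry-running the script on a toy file before submitting it can catch typos without waiting in the queue.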
Resources
Practice
Week 10
Interfacing with Remote Machines, Continued
Topics:
Assignments:
Resources
Practice
Version Control and Collaborations
Week 11
Git for Scientists
Topics:
- Git workflow
- GitHub
- Collaborating with Git
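A bare-bones version of the local Git workflow is sketched below, assuming git is installed. The repository name, user name, and email are placeholders, and the identity is configured for this throwaway repository only.

```shell
workdir=$(mktemp -d)
cd "$workdir"

git init -q demo_repo
cd demo_repo
# Identity set for this repository only; the name and email are placeholders
git config user.name "Example Student"
git config user.email "student@example.com"

echo "# Demo project" > README.md
git add README.md                       # stage the new file
git commit -q -m "Add README"           # record a snapshot

echo "Notes on data sources go here." >> README.md
git diff --stat                         # review what changed before staging
git add README.md
git commit -q -m "Describe where notes go"

git log --oneline                       # one line per commit, newest first
```

The edit, review, stage, commit rhythm is the same whether you work alone or push to a shared repository on GitHub.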
Assignments:
- Work through BDS Chapter 5
- Assignment 8 - Git collaboration and merge
- Group effort: Everyone (in turn) make changes to this repository
Resources
Practice
Week 12
Topics:
Assignments:
- Work through BDS Chapter 12
- Assignment 9 - Git collaboration and merge
Resources
Practice
- In-class collaborative name list
Putting it all together
Week 13
Composing Full Pipelines
Topics:
Assignments:
- Continue working through BDS Chapter 12
Resources
Practice
Week 14
Running a Pipeline on a Remote Machine
Topics:
Assignments:
Resources
Practice
Week 15
Topics:
- Testing with toy examples
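Testing with a toy example could be as simple as the sketch below: build an input small enough to check by eye, then compare what the pipeline step reports against what you know is true. The count_seqs function is a hypothetical stand-in for a real pipeline step.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical pipeline step under test: count the sequences in a FASTA file
count_seqs() {
    grep -c '^>' "$1"
}

# Toy input small enough to check by eye: exactly three sequences
printf '>s1\nACGT\n>s2\nGG\n>s3\nTTTT\n' > toy.fasta

expected=3
observed=$(count_seqs toy.fasta)

if [ "$observed" -eq "$expected" ]; then
    echo "PASS: counted $observed sequences"
else
    echo "FAIL: expected $expected but got $observed" >&2
fi
```

Once a step passes on toy data, you have some evidence it is doing what you think before you point it at gigabytes of real reads.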
Assignments:
Week 16
Where to go from here?
Topics:
Assignments:
- Assignment 10 - Reflection piece on what you’ve learned and what next steps you’ll take